LaTeX macros

\(\def\prob{P}\) \(\def\argmax{\operatorname{arg\,max}}\) \(\def\argmin{\operatorname{arg\,min}}\) \(\def\borel{\operatorname{Borel}}\) \(\def\cE{\cal E}\) \(\def\cP{\cal P}\) \(\def\R{\mathbb{R}}\) \(\def\N{\mathbb{N}}\) \(\def\Z{\mathbb{Z}}\) \(\def\Ee{\operatorname{E}}\) \(\def\va{\text{v.a.}}\) \(\def\var{\operatorname{var}}\) \(\def\cov{\operatorname{cov}}\) \(\def\cor{\operatorname{cor}}\) \(\def\binomdist{\operatorname{Binom}}\) \(\def\berndist{\operatorname{Ber}}\) \(\def\betabinomdist{\operatorname{Beta-Binom}}\) \(\def\betadist{\operatorname{Beta}}\) \(\def\expdist{\operatorname{Exponential}}\) \(\def\gammadist{\operatorname{Gamma}}\) \(\def\hyperdist{\operatorname{Hypergeom}}\) \(\def\hypergeomdist{\operatorname{Hypergeom}}\) \(\DeclareMathOperator{\multinomialdist}{Multinomial}\) \(\DeclareMathOperator{\multinomdist}{Multinom}\) \(\def\poissondist{\operatorname{Poisson}}\) \(\def\geomdist{\operatorname{Geom}}\) \(\def\normaldist{\operatorname{N}}\) \(\def\unifdist{\operatorname{Unif}}\) \(\DeclareMathOperator{\indica}{\mathbb{1}}\) \(\def\CondTo{\mathbin{|\mskip0.5mu}}\) ***

Submissions:

In groups of about three students (two is fine; four is possible but not advisable). Individual homework will also be accepted, but collaborative work is preferable.

Please send me an email with the team members' names as soon as you have formed your group.

Only one copy of each group’s work must be uploaded (by any member).

Full names and email addresses of all team members must appear in the header.

Format:

A Jupyter or R Markdown notebook, with a header clearly stating the names of all contributors.

Documentation:

Comments in code cells (e.g., meaning of variables, parameters, purpose of functions) are necessary but not sufficient.

You are expected to give full explanations of steps taken in your solution (in Markdown cells), as well as discussion of results and their meaning.

Do not be afraid of being too verbose or too elementary, explain as if to someone learning.

External sources

Getting inspiration from any book, document, blog, or web page, even mimicking solutions given there, is allowed and encouraged, provided you give a proper reference, understand every such material, and explain it in your own words, even more exhaustively.

Do not literally copy/paste large chunks of code, especially without a source reference: I will detect it, believe me. Bleak consequences.

Deadline:

Completed assignments are due on Monday, April 17. They are to be uploaded to the Virtual Campus.

01 - Stan version of a conjugate prior problem

Modelling Earthquake Waiting Times

Consider the problem in Exponential.02.Earthquake (notebook in 2023-03-27 folder), where the goal is to study earthquake waiting times.

The likelihood is modelled as an \(\expdist(\lambda)\), and \(\lambda\) is given a conjugate prior, \(\lambda\sim\gammadist(\alpha,\beta)\).

In the Exponential.02.Earthquake notebook some simulations are performed for:

  1. Prior pdf for \(\lambda\).
  2. Prior predictive pdf for the waiting time.
  3. Posterior pdf for \(\lambda\).
  4. Posterior predictive for new waiting time.

using known theoretical (analytical) descriptions of these distributions.

Your task is to redo these simulations using Stan (avoiding conjugate prior formulas), then compare your results to the analytical ones.

Use this comparison to tune up adjustable parameters in Stan sampling, such as chain length.

Problem introduction

In 2015, there were 9 significant earthquakes of magnitude 4.0+ in California, occurring on January 4, January 20, January 28, May 22, July 21, July 25, August 17, September 16, and December 30. The waiting times between these earthquakes are modeled with an exponential distribution of rate λ, so the expected waiting time between two earthquakes is 1/λ. The prior distribution for λ is a gamma distribution with parameters α and β, where α can be read as a prior effective sample size (a number of previously observed intervals) and β as the corresponding prior total waiting time.

Suppose our prior expectation for the waiting time between earthquakes is 30 days, i.e., E[λ] = 1/30 per day. We also want to use a prior effective sample size of one interval between earthquakes, which means that α = 1. Since the mean of a Gamma(α, β) distribution is α/β, this gives β = α/E[λ] = 30. Therefore, the prior distribution for λ is a gamma distribution with parameters α = 1 and β = 30.

We observe the waiting times between earthquakes to be 16, 8, 114, 60, 4, 23, 30, and 105 days, i.e., n = 8 intervals between the 9 events. We exclude the 4 remaining days of 2015 during which no complete interval was observed (the 3 days before the first earthquake and the 1 day after the last one); the vectors below show the data with and without these partial end segments. Using the observed data and the prior distribution, we can compute the posterior distribution of λ, which provides an updated estimate of the expected waiting time between earthquakes in California.
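As a quick base-R sanity check, the eight waiting times can be recomputed directly from the nine event dates listed above:

```r
# Dates of the nine magnitude 4.0+ events in California in 2015
dates <- as.Date(c("2015-01-04", "2015-01-20", "2015-01-28", "2015-05-22",
                   "2015-07-21", "2015-07-25", "2015-08-17", "2015-09-16",
                   "2015-12-30"))
# Differences between consecutive events give the 8 observed waiting times
waiting.times <- as.numeric(diff(dates))
waiting.times   # 16 8 114 60 4 23 30 105
```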

\[ \begin{array}{lcl} y &= &(16, 8, 114, 60, 4, 23, 30, 105),\\ y &= &(3, 16, 8, 114, 60, 4, 23, 30, 105, 1),\\ y &= &(3, 16, 8, 114, 60, 4, 23, 30, 105). \end{array} \]

#install.packages("rstan", dependencies=TRUE,repos= "https://cloud.r-project.org")
#remove.packages(c("StanHeaders", "rstan"))
#install.packages("StanHeaders", repos = c("https://mc-stan.org/r-packages/", getOption("repos")))
#install.packages("rstan", repos = c("https://mc-stan.org/r-packages/", getOption("repos")))
library(rstan)
## Loading required package: StanHeaders
## Loading required package: ggplot2
## rstan (Version 2.21.8, GitRev: 2e1f913d3ca3)
## For execution on a local, multicore CPU with excess RAM we recommend calling
## options(mc.cores = parallel::detectCores()).
## To avoid recompilation of unchanged Stan programs, we recommend calling
## rstan_options(auto_write = TRUE)
require(StanHeaders)
library(ggplot2)
rstan_options(auto_write = TRUE)
rstan_options(threads_per_chain = parallel::detectCores())

To plot the analytical predictive distributions, we need the actuar package: as specified in the 03.Exponential.02.Earthquake notebook, the predictive distribution in this case is a Pareto II (Lomax) distribution. That is why we install and load it below.

#install.packages("actuar")

require(actuar)
## Loading required package: actuar
## 
## Attaching package: 'actuar'
## The following objects are masked from 'package:stats':
## 
##     sd, var
## The following object is masked from 'package:grDevices':
## 
##     cm
library(actuar)

#install.packages("coda")
#install.packages("mvtnorm")

#install.packages("R2jags")
library(rjags)
## Loading required package: coda
## 
## Attaching package: 'coda'
## The following object is masked from 'package:rstan':
## 
##     traceplot
## Linked to JAGS 4.3.0
## Loaded modules: basemod,bugs
require(R2jags)
## Loading required package: R2jags
## 
## Attaching package: 'R2jags'
## The following object is masked from 'package:coda':
## 
##     traceplot
## The following object is masked from 'package:rstan':
## 
##     traceplot
#require(R2jags,quietly=TRUE)

A.1. Simulate from the prior pdf

# Parameters of the Gamma(alpha, beta) prior derived above.
prior.a<-1 
prior.b<-30

# Obtaining the theoretical prior mean, variance and standard deviation of the Gamma distribution defined in the cell above.
Theor.prior.mean<-prior.a/prior.b
Theor.prior.var<-prior.a/prior.b^2
Theor.prior.sd<-sqrt(Theor.prior.var)
round(Theor.prior.mean,4)
## [1] 0.0333
round(Theor.prior.var,4)
## [1] 0.0011
round(Theor.prior.sd,4)
## [1] 0.0333
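Before turning to Stan, the prior can also be simulated directly with `rgamma()` as a baseline; with a large sample, the Monte Carlo moments should be close to the theoretical values just computed (a plain base-R sketch):

```r
set.seed(1234)
M <- 100000                                          # number of prior draws
lambda.prior.sim <- rgamma(M, shape = 1, rate = 30)  # prior.a = 1, prior.b = 30
# Sample moments should approximate a/b = 0.0333 and a/b^2 = 0.0011
round(mean(lambda.prior.sim), 4)
round(var(lambda.prior.sim), 4)
```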

A.2. Simulate from the prior predictive pdf

Simulating the prior predictive for the waiting time requires nothing beyond the prior itself: by definition, the prior predictive is obtained by first drawing \(\lambda\) from its prior pdf and then drawing a waiting time \[ Y_{i}\mskip8mu\text{i.i.d.}\sim\mskip8mu\operatorname{Exponential}(\lambda). \] It is important that no information from the observations is used here, since the prior predictive describes our beliefs about a new waiting time before seeing any data. Relying solely on the prior pdf for \(\lambda\), we can therefore simulate the prior predictive distribution for the waiting time.
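Independently of Stan, this two-step recipe can be sketched directly in base R (using the prior parameters α = 1, β = 30): the resulting draws target the Pareto II (Lomax) prior predictive used later for comparison.

```r
set.seed(1234)
M <- 100000
lambda.sim <- rgamma(M, shape = 1, rate = 30)   # step 1: lambda ~ prior
y.prior.pred <- rexp(M, rate = lambda.sim)      # step 2: y | lambda ~ Exponential(lambda)
# The analytical prior predictive is Pareto II with shape a = 1 and scale b = 30,
# whose median is b*(2^(1/a) - 1) = 30
median(y.prior.pred)
```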

A.3. Simulate from the posterior pdf

For the posterior, we first initialize the vector of observations y (see the y-values above). The posterior distribution is again a Gamma distribution, with shape parameter α’ and rate parameter β’ obtained from the prior parameters α and β and the observed data: α’ = α + n and β’ = β + nȳ = β + Σyᵢ, where n is the number of observations and ȳ (y_bar) is the sample mean. These parameters will be used to plot the analytical curves, but not for the simulations. The Gamma distribution is a standard conjugate prior for rate parameters; it also appears, for instance, as the conjugate prior for the precision of a normal distribution. (Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A., & Rubin, D. B. (2014). Bayesian Data Analysis, Third Edition, Chapters 2 and 3. Chapman and Hall/CRC.)

y<-c(16, 8, 114, 60, 4, 23, 30, 105)
n<-length(y)
ybar<-mean(y)           # ȳ
# round(ybar,2)
nybar<-sum(y)      

posterior.a<-prior.a+n    # α'
posterior.b<-prior.b+nybar  # β'

# calculating statistics of the posterior distribution
Theor.lambda.post.mean<-posterior.a/posterior.b   
Theor.lambda.post.mode<-(posterior.a-1)/posterior.b  # For alpha>1, 0 for alpha=1.
Theor.lambda.post.var<-posterior.a/posterior.b^2
Theor.lambda.post.sd<-sqrt(Theor.lambda.post.var)

# rounding the values to cut some decimals
round(Theor.lambda.post.mean,4)
## [1] 0.0231
round(Theor.lambda.post.mode,4)
## [1] 0.0205
round(Theor.lambda.post.var,6)
## [1] 5.9e-05
round(Theor.lambda.post.sd,4)
## [1] 0.0077
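These conjugate formulas can be cross-checked, independently of Stan, by drawing directly from the Gamma posterior with the values just computed (a Monte Carlo sketch):

```r
set.seed(1234)
M <- 100000
# Posterior is Gamma(alpha' = 1 + 8 = 9, beta' = 30 + 360 = 390)
lambda.post.sim <- rgamma(M, shape = 9, rate = 390)
round(mean(lambda.post.sim), 4)   # should be close to 9/390 = 0.0231
round(sd(lambda.post.sim), 4)     # should be close to sqrt(9)/390 = 0.0077
```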

Stan simulations for A.1 and A.2 (prior and prior predictive)

To perform the prior simulations in Stan, the only inputs to the model are the prior parameters α and β, which specify the shape and rate of the Gamma prior placed on the exponential rate λ.

The parameter θ declared in the model plays the role of λ: its draws simulate the prior pdf for λ, while the generated quantity y_pred simulates the prior predictive waiting time.

modelString <- "
data {
        real<lower=0> prior_a;
        real<lower=0> prior_b;
    }

parameters {
        real<lower=0> theta;    // the exponential rate must be positive
    }

model {
        theta ~ gamma(prior_a,prior_b);
    }

generated quantities {
        real y_pred;
        y_pred = exponential_rng(theta);  // generating simulated samples for the waiting time prediction
    }
    "
# compile
stanDso <- stan_model( model_code=modelString ) 
## Trying to compile a simple C file
## Running /Library/Frameworks/R.framework/Resources/bin/R CMD SHLIB foo.c
## clang -mmacosx-version-min=10.13 -I"/Library/Frameworks/R.framework/Resources/include" -DNDEBUG   -I"/Library/Frameworks/R.framework/Versions/4.2/Resources/library/Rcpp/include/"  -I"/Library/Frameworks/R.framework/Versions/4.2/Resources/library/RcppEigen/include/"  -I"/Library/Frameworks/R.framework/Versions/4.2/Resources/library/RcppEigen/include/unsupported"  -I"/Library/Frameworks/R.framework/Versions/4.2/Resources/library/BH/include" -I"/Library/Frameworks/R.framework/Versions/4.2/Resources/library/StanHeaders/include/src/"  -I"/Library/Frameworks/R.framework/Versions/4.2/Resources/library/StanHeaders/include/"  -I"/Library/Frameworks/R.framework/Versions/4.2/Resources/library/RcppParallel/include/"  -I"/Library/Frameworks/R.framework/Versions/4.2/Resources/library/rstan/include" -DEIGEN_NO_DEBUG  -DBOOST_DISABLE_ASSERTS  -DBOOST_PENDING_INTEGER_LOG2_HPP  -DSTAN_THREADS  -DBOOST_NO_AUTO_PTR  -include '/Library/Frameworks/R.framework/Versions/4.2/Resources/library/StanHeaders/include/stan/math/prim/mat/fun/Eigen.hpp'  -D_REENTRANT -DRCPP_PARALLEL_USE_TBB=1   -I/usr/local/include   -fPIC  -Wall -g -O2  -c foo.c -o foo.o
## In file included from <built-in>:1:
## In file included from /Library/Frameworks/R.framework/Versions/4.2/Resources/library/StanHeaders/include/stan/math/prim/mat/fun/Eigen.hpp:13:
## In file included from /Library/Frameworks/R.framework/Versions/4.2/Resources/library/RcppEigen/include/Eigen/Dense:1:
## In file included from /Library/Frameworks/R.framework/Versions/4.2/Resources/library/RcppEigen/include/Eigen/Core:88:
## /Library/Frameworks/R.framework/Versions/4.2/Resources/library/RcppEigen/include/Eigen/src/Core/util/Macros.h:628:1: error: unknown type name 'namespace'
## namespace Eigen {
## ^
## /Library/Frameworks/R.framework/Versions/4.2/Resources/library/RcppEigen/include/Eigen/src/Core/util/Macros.h:628:16: error: expected ';' after top level declarator
## namespace Eigen {
##                ^
##                ;
## In file included from <built-in>:1:
## In file included from /Library/Frameworks/R.framework/Versions/4.2/Resources/library/StanHeaders/include/stan/math/prim/mat/fun/Eigen.hpp:13:
## In file included from /Library/Frameworks/R.framework/Versions/4.2/Resources/library/RcppEigen/include/Eigen/Dense:1:
## /Library/Frameworks/R.framework/Versions/4.2/Resources/library/RcppEigen/include/Eigen/Core:96:10: fatal error: 'complex' file not found
## #include <complex>
##          ^~~~~~~~~
## 3 errors generated.
## make: *** [foo.o] Error 1
# don't pass any observations to the model!
input_data <- list(prior_a=prior.a, prior_b=prior.b); 
# using 2000 iterations and 1000 warmup iterations since the complexity of the model does not require higher values 
fit <- sampling(stanDso, data = input_data, iter = 2000, chains=1, warmup = 1000, thin = 1)  
## 
## SAMPLING FOR MODEL 'c06d735ea0c11ba865c83955d5617b5e' NOW (CHAIN 1).
## Chain 1: Rejecting initial value:
## Chain 1:   Log probability evaluates to log(0), i.e. negative infinity.
## Chain 1:   Stan can't start sampling from this initial value.
## Chain 1: 
## Chain 1: Gradient evaluation took 2.5e-05 seconds
## Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0.25 seconds.
## Chain 1: Adjust your expectations accordingly!
## Chain 1: 
## Chain 1: 
## Chain 1: Iteration:    1 / 2000 [  0%]  (Warmup)
## Chain 1: Iteration:  200 / 2000 [ 10%]  (Warmup)
## Chain 1: Iteration:  400 / 2000 [ 20%]  (Warmup)
## Chain 1: Iteration:  600 / 2000 [ 30%]  (Warmup)
## Chain 1: Iteration:  800 / 2000 [ 40%]  (Warmup)
## Chain 1: Iteration: 1000 / 2000 [ 50%]  (Warmup)
## Chain 1: Iteration: 1001 / 2000 [ 50%]  (Sampling)
## Chain 1: Iteration: 1200 / 2000 [ 60%]  (Sampling)
## Chain 1: Iteration: 1400 / 2000 [ 70%]  (Sampling)
## Chain 1: Iteration: 1600 / 2000 [ 80%]  (Sampling)
## Chain 1: Iteration: 1800 / 2000 [ 90%]  (Sampling)
## Chain 1: Iteration: 2000 / 2000 [100%]  (Sampling)
## Chain 1: 
## Chain 1:  Elapsed Time: 0.017487 seconds (Warm-up)
## Chain 1:                0.02333 seconds (Sampling)
## Chain 1:                0.040817 seconds (Total)
## Chain 1:
## Warning: There were 552 divergent transitions after warmup. See
## https://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
## to find out why this is a problem and how to eliminate them.
## Warning: Examine the pairs() plot to diagnose sampling problems
## Warning: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
## Running the chains for more iterations may help. See
## https://mc-stan.org/misc/warnings.html#bulk-ess
# extracting the values for the waiting time
pr_theta <- extract(fit, 'theta')
pr_theta <- unlist(pr_theta, use.names=FALSE) # prior lambda
pry_pred <- extract(fit, 'y_pred')
pry_pred <- unlist(pry_pred, use.names=FALSE) # prior predictive waiting time

Simulations on the posterior pdfs (A.3 and A.4)

Let’s consider a new model that uses observations to perform simulations on the posterior probability density functions (PDFs). In this model, the number of observations (n) and the observation values (y[n]) are included in the simulations to generate posterior distributions.

modelString <- "
data {
        real<lower=0> prior_a;
        real<lower=0> prior_b;

        int<lower=0> n;        // number of observations
        real<lower=0> y[n];    // observation values (waiting times in days)
    }

parameters {
        real<lower=0> theta;   // the exponential rate must be positive
    }

model {
        theta ~ gamma(prior_a,prior_b);
        y ~ exponential(theta);
    }

generated quantities {
        real y_pred;
        y_pred = exponential_rng(theta);
    }
    "
# compile the new model: without this step, sampling() would silently reuse the
# prior-only model compiled above and ignore the observations
stanDso <- stan_model( model_code=modelString )
input_data <- list(n = n, y = y, prior_a=prior.a, prior_b=prior.b);  # again using the prior parameters!
fit <- sampling(stanDso, data = input_data, iter = 5000, chains=1, warmup = 2000, thin = 1)
## 
## SAMPLING FOR MODEL 'c06d735ea0c11ba865c83955d5617b5e' NOW (CHAIN 1).
## Chain 1: Rejecting initial value:
## Chain 1:   Log probability evaluates to log(0), i.e. negative infinity.
## Chain 1:   Stan can't start sampling from this initial value.
## Chain 1: 
## Chain 1: Gradient evaluation took 6e-06 seconds
## Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0.06 seconds.
## Chain 1: Adjust your expectations accordingly!
## Chain 1: 
## Chain 1: 
## Chain 1: Iteration:    1 / 5000 [  0%]  (Warmup)
## Chain 1: Iteration:  500 / 5000 [ 10%]  (Warmup)
## Chain 1: Iteration: 1000 / 5000 [ 20%]  (Warmup)
## Chain 1: Iteration: 1500 / 5000 [ 30%]  (Warmup)
## Chain 1: Iteration: 2000 / 5000 [ 40%]  (Warmup)
## Chain 1: Iteration: 2001 / 5000 [ 40%]  (Sampling)
## Chain 1: Iteration: 2500 / 5000 [ 50%]  (Sampling)
## Chain 1: Iteration: 3000 / 5000 [ 60%]  (Sampling)
## Chain 1: Iteration: 3500 / 5000 [ 70%]  (Sampling)
## Chain 1: Iteration: 4000 / 5000 [ 80%]  (Sampling)
## Chain 1: Iteration: 4500 / 5000 [ 90%]  (Sampling)
## Chain 1: Iteration: 5000 / 5000 [100%]  (Sampling)
## Chain 1: 
## Chain 1:  Elapsed Time: 0.044495 seconds (Warm-up)
## Chain 1:                0.059034 seconds (Sampling)
## Chain 1:                0.103529 seconds (Total)
## Chain 1:
## Warning: There were 1466 divergent transitions after warmup. See
## https://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
## to find out why this is a problem and how to eliminate them.
## Warning: Examine the pairs() plot to diagnose sampling problems
# for this model, we also perform the extractions
pos_theta <- extract(fit, 'theta')
pos_theta <- unlist(pos_theta, use.names=FALSE) # posterior lambda
posy_pred <- extract(fit, 'y_pred')
posy_pred <- unlist(posy_pred, use.names=FALSE) # posterior predictive waiting time

Plotting the density for lambda and waiting times

Prior functions

options(repr.plot.width=14,repr.plot.height=7)
old.par<- par(mfrow=c(1,2))

# Simulated prior parameter
plot(density(pr_theta),
     xlab=expression(pr_theta), col="blue",lwd=3,
     main="Lambda", ylim=c(0,30))
# Analytical prior parameter
curve(dgamma(x, prior.a, prior.b), 
      add=TRUE, col="hotpink")
legend(x=0.06,y=20,legend=c("Analytical prior", "Simulated prior"), 
           col=c("hotpink", "blue"), lty=c(1,1), bty="n",lwd=2)
# Simulated prior predictive
plot(density(pry_pred),
     xlab=expression(pry_pred), col="blue",lwd=3,
     main="Waiting Time",xlim=c(0,1000), ylim=c(0,0.015))
# Analytical prior predictive
curve(dpareto(x, prior.a, prior.b), 
      add=TRUE, col="hotpink", lwd=1.5, lty=2)
legend(x=300,y=0.010,legend=c("Analytical prior predictive", "Simulated prior predictive"), 
           col=c("hotpink", "blue"), lty=c(2,1), bty="n",lwd=3)

par(old.par)

As we can see from the graphs above, the analytical and simulated priors agree closely: the curves overlap.

Posterior Distributions

options(repr.plot.width=14,repr.plot.height=7)
old.par<- par(mfrow=c(1,2))

# Simulated posterior parameter
plot(density(pos_theta),
     xlab=expression(pos_theta), col="blue",lwd=3,
     main="Lambda", ylim=c(0,60))
# Analytical posterior parameter
curve(dgamma(x, posterior.a, posterior.b), 
      add=TRUE, col="hotpink")
legend(x=0.06,y=50,legend=c("Analytical posterior", "Simulated posterior"), 
           col=c("hotpink", "blue"), lty=c(1,1), bty="n",lwd=2)
# Simulated posterior predictive
plot(density(posy_pred),
     xlab=expression(posy_pred), col="blue",lwd=3,
     main="Waiting Time",xlim=c(0,1000), ylim=c(0,0.016))
# Analytical posterior predictive
curve(dpareto(x, posterior.a, posterior.b), 
      add=TRUE, col="hotpink", lwd=1.5, lty=2)
legend(x=300,y=0.010,legend=c("Analytical posterior predictive", "Simulated posterior predictive"), 
           col=c("hotpink", "blue"), lty=c(2,1), bty="n",lwd=3)

par(old.par)

The graph shows that the analytical and simulated posteriors again behave similarly; the small discrepancies around the peak are expected from the finite Monte Carlo sample and the smoothing of the kernel density estimate.

Adjusting parameters

As suggested in the assignment, and prompted by the warnings Stan issues after fitting, we adjusted the sampling settings. We increased the number of chains from 1 to 3: running several chains from different starting points reduces the influence of any single initial value and makes it possible to assess convergence (e.g., by comparing the chains via the R-hat statistic).

The number of iterations (5000) and warmup iterations (2000) are kept as in the previous fit.

fit <- sampling(stanDso, data = input_data, iter = 5000, chains=3, warmup = 2000, thin = 1) 
## 
## SAMPLING FOR MODEL 'c06d735ea0c11ba865c83955d5617b5e' NOW (CHAIN 1).
## Chain 1: 
## Chain 1: Gradient evaluation took 6e-06 seconds
## Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0.06 seconds.
## Chain 1: Adjust your expectations accordingly!
## Chain 1: 
## Chain 1: 
## Chain 1: Iteration:    1 / 5000 [  0%]  (Warmup)
## Chain 1: Iteration:  500 / 5000 [ 10%]  (Warmup)
## Chain 1: Iteration: 1000 / 5000 [ 20%]  (Warmup)
## Chain 1: Iteration: 1500 / 5000 [ 30%]  (Warmup)
## Chain 1: Iteration: 2000 / 5000 [ 40%]  (Warmup)
## Chain 1: Iteration: 2001 / 5000 [ 40%]  (Sampling)
## Chain 1: Iteration: 2500 / 5000 [ 50%]  (Sampling)
## Chain 1: Iteration: 3000 / 5000 [ 60%]  (Sampling)
## Chain 1: Iteration: 3500 / 5000 [ 70%]  (Sampling)
## Chain 1: Iteration: 4000 / 5000 [ 80%]  (Sampling)
## Chain 1: Iteration: 4500 / 5000 [ 90%]  (Sampling)
## Chain 1: Iteration: 5000 / 5000 [100%]  (Sampling)
## Chain 1: 
## Chain 1:  Elapsed Time: 0.033756 seconds (Warm-up)
## Chain 1:                0.099206 seconds (Sampling)
## Chain 1:                0.132962 seconds (Total)
## Chain 1: 
## 
## SAMPLING FOR MODEL 'c06d735ea0c11ba865c83955d5617b5e' NOW (CHAIN 2).
## Chain 2: Rejecting initial value:
## Chain 2:   Log probability evaluates to log(0), i.e. negative infinity.
## Chain 2:   Stan can't start sampling from this initial value.
## Chain 2: 
## Chain 2: Gradient evaluation took 6e-06 seconds
## Chain 2: 1000 transitions using 10 leapfrog steps per transition would take 0.06 seconds.
## Chain 2: Adjust your expectations accordingly!
## Chain 2: 
## Chain 2: 
## Chain 2: Iteration:    1 / 5000 [  0%]  (Warmup)
## Chain 2: Iteration:  500 / 5000 [ 10%]  (Warmup)
## Chain 2: Iteration: 1000 / 5000 [ 20%]  (Warmup)
## Chain 2: Iteration: 1500 / 5000 [ 30%]  (Warmup)
## Chain 2: Iteration: 2000 / 5000 [ 40%]  (Warmup)
## Chain 2: Iteration: 2001 / 5000 [ 40%]  (Sampling)
## Chain 2: Iteration: 2500 / 5000 [ 50%]  (Sampling)
## Chain 2: Iteration: 3000 / 5000 [ 60%]  (Sampling)
## Chain 2: Iteration: 3500 / 5000 [ 70%]  (Sampling)
## Chain 2: Iteration: 4000 / 5000 [ 80%]  (Sampling)
## Chain 2: Iteration: 4500 / 5000 [ 90%]  (Sampling)
## Chain 2: Iteration: 5000 / 5000 [100%]  (Sampling)
## Chain 2: 
## Chain 2:  Elapsed Time: 0.045556 seconds (Warm-up)
## Chain 2:                0.146564 seconds (Sampling)
## Chain 2:                0.19212 seconds (Total)
## Chain 2: 
## 
## SAMPLING FOR MODEL 'c06d735ea0c11ba865c83955d5617b5e' NOW (CHAIN 3).
## Chain 3: Rejecting initial value:
## Chain 3:   Log probability evaluates to log(0), i.e. negative infinity.
## Chain 3:   Stan can't start sampling from this initial value.
## Chain 3: 
## Chain 3: Gradient evaluation took 6e-06 seconds
## Chain 3: 1000 transitions using 10 leapfrog steps per transition would take 0.06 seconds.
## Chain 3: Adjust your expectations accordingly!
## Chain 3: 
## Chain 3: 
## Chain 3: Iteration:    1 / 5000 [  0%]  (Warmup)
## Chain 3: Iteration:  500 / 5000 [ 10%]  (Warmup)
## Chain 3: Iteration: 1000 / 5000 [ 20%]  (Warmup)
## Chain 3: Iteration: 1500 / 5000 [ 30%]  (Warmup)
## Chain 3: Iteration: 2000 / 5000 [ 40%]  (Warmup)
## Chain 3: Iteration: 2001 / 5000 [ 40%]  (Sampling)
## Chain 3: Iteration: 2500 / 5000 [ 50%]  (Sampling)
## Chain 3: Iteration: 3000 / 5000 [ 60%]  (Sampling)
## Chain 3: Iteration: 3500 / 5000 [ 70%]  (Sampling)
## Chain 3: Iteration: 4000 / 5000 [ 80%]  (Sampling)
## Chain 3: Iteration: 4500 / 5000 [ 90%]  (Sampling)
## Chain 3: Iteration: 5000 / 5000 [100%]  (Sampling)
## Chain 3: 
## Chain 3:  Elapsed Time: 0.048967 seconds (Warm-up)
## Chain 3:                0.056829 seconds (Sampling)
## Chain 3:                0.105796 seconds (Total)
## Chain 3:
## Warning: There were 4450 divergent transitions after warmup. See
## https://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
## to find out why this is a problem and how to eliminate them.
## Warning: Examine the pairs() plot to diagnose sampling problems
pos_theta <- extract(fit, 'theta')
pos_theta <- unlist(pos_theta, use.names=FALSE) # posterior lambda
posy_pred <- extract(fit, 'y_pred')
posy_pred <- unlist(posy_pred, use.names=FALSE) # posterior predictive waiting time

options(repr.plot.width=14,repr.plot.height=7)
old.par<- par(mfrow=c(1,2))
# Simulated posterior parameter
plot(density(pos_theta),
     xlab=expression(pos_theta), col="blue",lwd=3,
     main="Lambda",ylim=c(0,60))
# Analytical posterior parameter
curve(dgamma(x, posterior.a, posterior.b), 
      add=TRUE, col="hotpink")
legend(x=0.08,y=50,legend=c("Analytical posterior", "Simulated posterior"), 
           col=c("hotpink", "blue"), lty=c(1,1), bty="n",lwd=2)
# Simulated posterior predictive
plot(density(posy_pred),
     xlab=expression(posy_pred), col="blue",lwd=3,
     main="Waiting Time",xlim=c(0,1000), ylim=c(0,0.020))
# Analytical posterior predictive
curve(dpareto(x, posterior.a, posterior.b), 
      add=TRUE, col="hotpink", lwd=1.5, lty=2)
legend(x=300,y=0.017,legend=c("Analytical posterior predictive", "Simulated posterior predictive"), 
           col=c("hotpink", "blue"), lty=c(2,1), bty="n",lwd=3)

par(old.par)

Increasing the number of chains and warmup iterations led to a lower density of the simulated posterior and less similarity to the analytical posterior. A likely reason is the large number of divergent transitions Stan keeps reporting (4450 after warmup): with theta declared without a lower bound, the sampler repeatedly proposes values in the invalid region theta ≤ 0, and the resulting rejections and divergences bias the retained draws. Declaring the parameter as real<lower=0> theta removes these warnings.

02 - A more elaborate mixture prior for the spinning coin

(continued from Diaconis experiment)

On reflection, it was decided that tails had come up more often than heads in the past; further, some coins seemed likely to be symmetric.

Thus, a final approximation to the prior was taken as:

\[ 0.50\cdot\betadist(10,20) + 0.20\cdot\betadist(15,15) + 0.30\cdot\betadist(20,10). \]

Same observed data as in the previous model.

# Number of trials
n<-10
# Observed x
x.obs<-3

Perform a complete Bayesian analysis of this model, in close parallel to the first example.

  1. Using the theoretical formulas (prior predictive pmf, posterior pdf, posterior predictive pmf)

  2. Using independent random numbers (rbeta() functions, etc.)

  3. JAGS version

  4. (*) Stan version.

(*) Hint: this one is difficult due to intrinsic limitations in Stan: Stan does not allow integer parameters, thus the JAGS code cannot be translated literally.

In fact, even a Stan version of the two-component prior mixture in Mixture.priors.02.ipynb is tricky.
There are several possible workarounds; try to find one, but do not despair if you fail to develop a workable version.

Diaconis and Ylvisaker (1985) compare both mixture conjugate priors with a \(\unifdist(0,1)\) prior on the data above.

Comparing the MAP estimators, they observe that to a first approximation these coincide, but the spreads do depend on the prior.

They repeat the computations above with a larger sample.

n1<-50
x1.obs<-14

Their conclusion is that with small data, prior matters, but with larger samples, a finely tuned choice of prior is less important.

Mixture PDF

This model uses a mixture of three Beta distributions as the prior for θ, combined with a Binomial(n, θ) likelihood. Since each Beta component is conjugate to the Binomial likelihood, the posterior is again a mixture of Beta distributions with updated parameters and weights. (Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A., & Rubin, D. B. (2014). Bayesian Data Analysis (Vol. 2). CRC Press. Chapter 5)

We’ll start by defining the parameters and the functions to compute the mixture prior.

parameters of the mixture prior

prior.alpha1<-10
prior.beta1<-20
prior.alpha2<-15
prior.beta2<-15
prior.alpha3<-20
prior.beta3<-10
prior.gamma1<-0.5
prior.gamma2<-0.2
prior.gamma3<-0.3

Two functions to compute the pdf and cdf of the Betas

mixture.prior.pdf<-function(theta){
    return(prior.gamma1*dbeta(theta,prior.alpha1,prior.beta1)+prior.gamma2*dbeta(theta,prior.alpha2,prior.beta2)+prior.gamma3*dbeta(theta,prior.alpha3,prior.beta3))
}
mixture.prior.cdf<-function(theta){
    return(prior.gamma1*pbeta(theta,prior.alpha1,prior.beta1)+prior.gamma2*pbeta(theta,prior.alpha2,prior.beta2)+prior.gamma3*pbeta(theta,prior.alpha3,prior.beta3))
}
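As a quick sanity check on this definition, the mixture must itself be a proper density: the weights sum to one and each component is a proper Beta pdf, so the mixture should integrate to 1 over [0, 1]. A compact, self-contained version of the same pdf (identical weights and parameters) makes the check explicit:

```r
# Mixture prior pdf, rewritten compactly with the same weights and Beta parameters
mix.pdf <- function(theta)
    0.5 * dbeta(theta, 10, 20) + 0.2 * dbeta(theta, 15, 15) + 0.3 * dbeta(theta, 20, 10)
# A proper density must integrate to 1 over [0, 1]
integrate(mix.pdf, 0, 1)$value
```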

Plot the Mixture pdf

options(repr.plot.width=18,repr.plot.height=7)
par(mfrow=c(1,2))
u<-seq(0,1,length=1000)
v<-mixture.prior.pdf(u)
plot(u,v,ylim=c(0,max(v)*1.05),xlim=c(-0.15,1.15),
     ylab="Density",type="l",lwd=3.5,col="hotpink",main=sprintf("Mixture pdf"))
lines(c(-0.15,0),c(0,0),lwd=3.5,col="hotpink")
lines(c(1,1.15),c(0,0),lwd=3.5,col="hotpink")

# Bayesian analysis of this model

1. Using the theoretical formulas

Set up of the posterior functions

# Calculating posterior alphas and betas
posterior.alpha1<-prior.alpha1+x.obs
posterior.beta1<-prior.beta1+n-x.obs
posterior.alpha2<-prior.alpha2+x.obs
posterior.beta2<-prior.beta2+n-x.obs
posterior.alpha3<-prior.alpha3+x.obs
posterior.beta3<-prior.beta3+n-x.obs

# Calculate the marginals for x, integrating out theta
f1<-choose(n,x.obs)*beta(posterior.alpha1,posterior.beta1)/beta(prior.alpha1,prior.beta1)
f2<-choose(n,x.obs)*beta(posterior.alpha2,posterior.beta2)/beta(prior.alpha2,prior.beta2)
f3<-choose(n,x.obs)*beta(posterior.alpha3,posterior.beta3)/beta(prior.alpha3,prior.beta3)

# Initializing posterior gammas
posterior.gamma1<-prior.gamma1*f1
posterior.gamma2<-prior.gamma2*f2
posterior.gamma3<-prior.gamma3*f3

norm.const<-posterior.gamma1+posterior.gamma2+posterior.gamma3   # avoid masking base::sum

# normalizing the posterior gammas (the weights must sum to 1)
posterior.gamma1<-posterior.gamma1/norm.const
posterior.gamma2<-posterior.gamma2/norm.const
posterior.gamma3<-posterior.gamma3/norm.const
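For larger α, β, or n, the beta() and choose() factors above can under- or overflow; an equivalent and numerically safer route computes the unnormalized log-weights with lchoose() and lbeta() and normalizes softmax-style. A self-contained sketch with the same prior parameters and data (n = 10, x = 3):

```r
# Posterior mixture weights on the log scale (same priors and data as above)
log.m <- function(a, b, n, x) lchoose(n, x) + lbeta(a + x, b + n - x) - lbeta(a, b)
lw <- log(c(0.5, 0.2, 0.3)) +
      c(log.m(10, 20, 10, 3), log.m(15, 15, 10, 3), log.m(20, 10, 10, 3))
w <- exp(lw - max(lw))   # subtract the max before exponentiating for stability
w <- w / sum(w)          # normalized posterior weights
round(w, 4)
```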

# Initialize pdf and cdf functions for the new mixture
mixture.posterior.pdf<-function(theta){
    return(posterior.gamma1*dbeta(theta,posterior.alpha1,posterior.beta1)+posterior.gamma2*dbeta(theta,posterior.alpha2,posterior.beta2)+posterior.gamma3*dbeta(theta,posterior.alpha3,posterior.beta3))
}

mixture.posterior.cdf<-function(theta){
    return(posterior.gamma1*pbeta(theta,posterior.alpha1,posterior.beta1)+posterior.gamma2*pbeta(theta,posterior.alpha2,posterior.beta2)+posterior.gamma3*pbeta(theta,posterior.alpha3,posterior.beta3))
}

Plot theoretical posterior pdf and MAP estimator of \(\theta\)

The MAP of a mixture distribution cannot be computed in closed form: the mixture is a combination of several component distributions, and its maximum need not coincide with the maximum of any individual component. Therefore, we compute the maximum of the posterior pdf numerically, on a grid of θ values.

options(repr.plot.width=18,repr.plot.height=7)
par(mfrow=c(1,2))
u<-seq(0,1,length=1000)
v<-mixture.posterior.pdf(u)
plot(u,v,ylim=c(0,max(v)*1.05),xlim=c(-0.15,1.15),
     ylab="Density",type="l",lwd=3.5,col="hotpink",main=sprintf("Mixture posterior pdf"))
lines(c(-0.15,0),c(0,0),lwd=3.5,col="hotpink")
lines(c(1,1.15),c(0,0),lwd=3.5,col="hotpink")


Theor.MAP<-u[which.max(v)]
plot(u,v,ylim=c(0,max(v)*1.05),xlim=c(-0.15,1.15),
     ylab="Density",type="l",lwd=3.5,col="hotpink",main=sprintf("Mixture posterior pdf with the MAP"))
lines(c(-0.15,0),c(0,0),lwd=3.5,col="hotpink")
lines(c(1,1.15),c(0,0),lwd=3.5,col="hotpink")

abline(v=Theor.MAP,col="blue",lwd=4)
map <- round(Theor.MAP,3)
name <- paste("MAP value:", map)
legend(x=0.5,y=3.5,legend=c("Mixture posterior pdf",name), 
           col=c("hotpink", "blue"), lty=1, bty="n",lwd=3)

Posterior Expectation

First, we compute the theoretical posterior expectation of the mixture of three Beta distributions.

Theoretical Expectation
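The expectation of a mixture is the weighted average of its component expectations. With normalized posterior weights \( \gamma_i^{*} \) and components \( \betadist(\alpha_i^{*}, \beta_i^{*}) \):

\[ \Ee[\theta \CondTo x] \;=\; \sum_{i=1}^{3} \gamma_i^{*}\,\frac{\alpha_i^{*}}{\alpha_i^{*} + \beta_i^{*}} \]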

Theor.Posterior.Expectation1<-(posterior.alpha1)/(posterior.alpha1+posterior.beta1)
Theor.Posterior.Expectation2<-(posterior.alpha2)/(posterior.alpha2+posterior.beta2)
Theor.Posterior.Expectation3<-(posterior.alpha3)/(posterior.alpha3+posterior.beta3)
Theor.Posterior.Expectation<-posterior.gamma1*Theor.Posterior.Expectation1+
    posterior.gamma2*Theor.Posterior.Expectation2+
    posterior.gamma3*Theor.Posterior.Expectation3

print(paste0("Theoretical Posterior expectation: ",round(Theor.Posterior.Expectation,3)))
## [1] "Theoretical Posterior expectation: 0.36"

Posterior Variance

Now we compute the theoretical posterior variance.
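By the law of total variance, the variance of a mixture combines the within-component variances and the spread of the component means \( \mu_i \) around the overall mean \( \mu = \Ee[\theta \CondTo x] \):

\[ \var(\theta \CondTo x) \;=\; \sum_{i=1}^{3} \gamma_i^{*}\,\sigma_i^{2} \;+\; \sum_{i=1}^{3} \gamma_i^{*}\,(\mu_i - \mu)^{2} \]

where \( \sigma_i^2 \) is the variance of the \(i\)-th posterior Beta component.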

# Calculating the individual posterior variances for each component
Theor.Posterior.Variance1<-(posterior.alpha1*posterior.beta1)/((posterior.alpha1+posterior.beta1)^2*(posterior.alpha1+posterior.beta1+1))
Theor.Posterior.Variance2<-(posterior.alpha2*posterior.beta2)/((posterior.alpha2+posterior.beta2)^2*(posterior.alpha2+posterior.beta2+1))
Theor.Posterior.Variance3<-(posterior.alpha3*posterior.beta3)/((posterior.alpha3+posterior.beta3)^2*(posterior.alpha3+posterior.beta3+1))

Theor.Posterior.Variance<-posterior.gamma1*Theor.Posterior.Variance1+
    posterior.gamma2*Theor.Posterior.Variance2+
    posterior.gamma3*Theor.Posterior.Variance3+
        posterior.gamma1*(Theor.Posterior.Expectation1-Theor.Posterior.Expectation)^2+
        posterior.gamma2*(Theor.Posterior.Expectation2-Theor.Posterior.Expectation)^2+
        posterior.gamma3*(Theor.Posterior.Expectation3-Theor.Posterior.Expectation)^2


print(paste0("Theoretical Posterior variance: ",round(Theor.Posterior.Variance,3)))
## [1] "Theoretical Posterior variance: 0.01"

# 2. Using independent random numbers

Now we will try a different approach using independent random numbers. We start by defining our parameters: the number of simulated experiments (n_exp), the number of times each coin is flipped (n_flip), and the number of heads of interest (m_heads).

n_exp <- 20000
n_flip <- 10 
m_heads <- 3

We generate 20,000 theta values by sampling from the prior mixture specified in the problem statement.

To do this, we follow two steps. First, we randomly assign to each coin the Beta component it comes from, according to the mixture weights. Second, we draw a theta value for each coin from its assigned component using the function rbeta.

# Assign each coin to one of the three Beta components, with the mixture weights
source_beta <- sample(c(1,2,3), replace = TRUE, size = n_exp, prob=c(0.5, 0.2, 0.3))

# Draw each coin's theta from its assigned Beta component (vectorized over coins)
alphas <- c(10, 15, 20)
betas  <- c(20, 15, 10)
thetas <- rbeta(n_exp, alphas[source_beta], betas[source_beta])

hist(thetas, breaks=50)

Subsequently, the simulation flips each coin n_flip = 10 times, drawing the number of heads from a binomial distribution whose success probability is that coin’s theta. Tabulating the resulting head counts gives a simulated prior predictive probability mass function; the next cell shows the frequency table.

# Number of heads in the trials
heads <- rbinom(n = n_exp, size = n_flip, prob = thetas)

# Absolute frequencies
print("Absolute frequency of m heads:")
## [1] "Absolute frequency of m heads:"
table(heads)
## heads
##    0    1    2    3    4    5    6    7    8    9   10 
##  342 1214 2138 2995 3178 2995 2676 2154 1432  702  174
# Relative frequencies. 
print("Relative frequency of m heads:")
## [1] "Relative frequency of m heads:"
f <- table(heads)/sum(table(heads))
f
## heads
##       0       1       2       3       4       5       6       7       8       9 
## 0.01710 0.06070 0.10690 0.14975 0.15890 0.14975 0.13380 0.10770 0.07160 0.03510 
##      10 
## 0.00870
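As a cross-check (a sketch, assuming the same priors and weights as above), the theoretical prior predictive pmf is a mixture of beta-binomial terms; evaluated at m = 3 heads it should be close to the simulated relative frequency of about 0.15:

```r
# Theoretical prior predictive P(X = m) for the assumed Beta mixture prior:
# a weighted sum of beta-binomial pmf values.
a0 <- c(10, 15, 20); b0 <- c(20, 15, 10); g0 <- c(0.5, 0.2, 0.3)
n.flip.chk <- 10; m.chk <- 3
pred <- sum(g0 * choose(n.flip.chk, m.chk) *
            beta(a0 + m.chk, b0 + n.flip.chk - m.chk) / beta(a0, b0))
round(pred, 4)  # should be close to the simulated relative frequency at m = 3
```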

Now we plot the simulated prior predictive pmf.

matplot(x=names(f),y=f,type="h",lwd=7, lty=1,
        xlab="x",ylab="Relative frequency",
        main="Simulated prior predictive pmf")

To analyze the probability of obtaining exactly m = 3 heads, we select the simulated cases with that outcome and build a histogram of the theta values that produced it.

# m=3 heads
m.heads.idx <- heads == m_heads

# Proportion m=3 heads
print(paste0("Proportion of samples with m=3 heads: ", sum(m.heads.idx)/length(m.heads.idx)))
## [1] "Proportion of samples with m=3 heads: 0.14975"
# Thetas m=3 heads
thetas.m.heads <- thetas[m.heads.idx]

# Theta values generating m=3 heads
hist(thetas.m.heads, breaks=50, freq=FALSE, main = "Histogram of thetas producing m heads", col="lightpink")

Analysis of the posterior quantities, MAP estimator of \(θ\)

As done above, we will compute the simulated MAP as the \(θ\) value with maximal density.

thetas.m.heads.density <-density(thetas.m.heads)
Sim.MAP <- thetas.m.heads.density$x[which.max(thetas.m.heads.density$y)]
map <- round(Sim.MAP,3)
name <- paste("Simulated MAP value:", map)
# Plot over histogram
hist(thetas.m.heads, breaks=50, freq=FALSE, main="Histogram of thetas producing m=3 heads", col="lightpink")
abline(v=Sim.MAP,col="DarkRed",lwd=4)
lines(thetas.m.heads.density$x,thetas.m.heads.density$y,lwd=2.5,col="blue")

legend(x=0.6,y=3,legend=c("Theta density",name), 
           col=c("blue","DarkRed"), lty=1, bty="n",lwd=3)

Posterior expectation, variance and quantiles

The simulation also lets us estimate the posterior expectation, variance and quantiles directly from the sample, as the following code blocks demonstrate.

# Simulated Expectation 
print(paste0("Simulation based posterior expectation: ",round(mean(thetas.m.heads),3)))
## [1] "Simulation based posterior expectation: 0.359"
# Simulated Variance
print(paste0("Simulation based posterior variance: ",round(var(thetas.m.heads),3)))
## [1] "Simulation based posterior variance: 0.01"
# Simulated Quantiles
print("Simulation based posterior quantiles: ")
## [1] "Simulation based posterior quantiles: "
quantiles <- quantile(thetas.m.heads,c(0,0.25,0.50,0.75,1))
round(quantiles,4)
##     0%    25%    50%    75%   100% 
## 0.0956 0.2890 0.3436 0.4193 0.7717
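Beyond fixed quantiles, the same kind of sample yields credible intervals directly. The sketch below regenerates a posterior sample with the same scheme (the seed and the `.chk` names are illustrative assumptions, so the chunk is self-contained) and computes a 95% equal-tailed credible interval:

```r
# Regenerate a posterior sample of theta given m = 3 heads, then take the
# 2.5% and 97.5% sample quantiles as a 95% equal-tailed credible interval.
set.seed(123)                                  # illustrative seed
src.chk <- sample(1:3, 20000, replace = TRUE, prob = c(0.5, 0.2, 0.3))
th.chk  <- rbeta(20000, c(10, 15, 20)[src.chk], c(20, 15, 10)[src.chk])
h.chk   <- rbinom(20000, 10, th.chk)           # heads in 10 flips per coin
ci <- quantile(th.chk[h.chk == 3], c(0.025, 0.975))
round(ci, 3)
```

The endpoints should agree with the 2.5% and 97.5% posterior quantiles reported by JAGS in the next section (roughly 0.20 and 0.60).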

# 3. JAGS version

Now we conduct the same analysis with JAGS, first installing (if necessary) and loading the corresponding package.

#install.packages("R2jags",dependencies=TRUE,repos= "https://cloud.r-project.org")

require(R2jags)

We define the data list to be passed to the JAGS model; the model itself is defined below.

Mix.01.input_data<-list(a1=prior.alpha1,b1=prior.beta1,a2=prior.alpha2,b2=prior.beta2,a3=prior.alpha3,b3=prior.beta3
                 ,gamma1=prior.gamma1,gamma2=prior.gamma2,gamma3=prior.gamma3,n=n,x=x.obs)

JAGS model

cat(
"model
    {
    x~dbin(p,n)            
    p<-theta[r]
    r~dcat(g[])
    theta[1]~dbeta(a1,b1) 
    theta[2]~dbeta(a2,b2)
    theta[3]~dbeta(a3,b3)
    g[1]<-gamma1
    g[2]<-gamma2
    g[3]<-gamma3
    }"
    ,file="Mix.01.jag")

These arguments (number of chains, iterations and burn-in) control the behavior of the MCMC algorithm and are important for accurate and efficient estimation of the posterior distribution.

Mix.01.m1<-jags(data=Mix.01.input_data, n.chains=4,n.iter=3500,n.burnin=500, 
        parameters.to.save=c("theta","p"), model.file="Mix.01.jag")
## module glm loaded
## Compiling model graph
##    Resolving undeclared variables
##    Allocating nodes
## Graph information:
##    Observed stochastic nodes: 1
##    Unobserved stochastic nodes: 4
##    Total graph size: 17
## 
## Initializing model

Summary statistics of the fitted JAGS model

print(Mix.01.m1)
## Inference for Bugs model at "Mix.01.jag", fit using jags,
##  4 chains, each with 3500 iterations (first 500 discarded), n.thin = 3
##  n.sims = 4000 iterations saved
##          mu.vect sd.vect  2.5%   25%   50%   75% 97.5%  Rhat n.eff
## p          0.361   0.101 0.202 0.290 0.347 0.417 0.599 1.001  3200
## theta[1]   0.327   0.075 0.189 0.274 0.324 0.376 0.484 1.001  4000
## theta[2]   0.491   0.089 0.320 0.429 0.490 0.554 0.662 1.001  4000
## theta[3]   0.663   0.086 0.489 0.605 0.665 0.724 0.821 1.001  4000
## deviance   3.239   1.032 2.643 2.686 2.843 3.293 6.292 1.001  4000
## 
## For each parameter, n.eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor (at convergence, Rhat=1).
## 
## DIC info (using the rule, pD = var(deviance)/2)
## pD = 0.5 and DIC = 3.8
## DIC is an estimate of expected predictive error (lower deviance is better).

Use traceplots to examine the chains for evidence of stationarity

After fitting, we can display the traceplots: for each monitored variable, the sampled values are plotted against the iteration number in a separate panel.

options(repr.plot.width=15,repr.plot.height=5)
traceplot(Mix.01.m1)

Finally, we extract the posterior sample of p and plot its estimated posterior probability density function (pdf).

p.sample<-Mix.01.m1$BUGSoutput$sims.list$p
p.sample.density<-density(p.sample)
plot(p.sample.density,lwd=2.5,col="lightpink",main=expression(paste("Estimated Posterior PDF")),cex.main=1.6)

# Analysis of posterior quantities

Based on these results, we can analyze the posterior quantities. As before, the MAP is estimated as the theta value of maximal (kernel-estimated) density.

# Finding MAP
thetas.m.heads<-p.sample
thetas.m.heads.density <-density(p.sample)
Sim.MAP <- thetas.m.heads.density$x[which.max(thetas.m.heads.density$y)]

# Plot over histogram
hist(thetas.m.heads, breaks=50, freq=FALSE, main="Histogram of thetas producing m heads")
abline(v=Sim.MAP,col="DarkRed",lwd=4)
lines(thetas.m.heads.density$x,thetas.m.heads.density$y,lwd=2.5,col="DarkGreen")
text(0.42, 0.5, "Simulated MAP", col="DarkRed")

print(paste0("Simulated MAP: ", round(Sim.MAP,3)))
## [1] "Simulated MAP: 0.331"

Expectations, variance and quantiles

# Simulated Expectation
print(paste0("Simulation based posterior expectation:",round(mean(thetas.m.heads),3)))
## [1] "Simulation based posterior expectation:0.361"
# Simulated Variance
print(paste0("Simulation based posterior variance:",round(var(thetas.m.heads),3)))
## [1] "Simulation based posterior variance:0.01"
# Simulated Quantiles
print("Simulation based posterior quantiles:")
## [1] "Simulation based posterior quantiles:"
quantiles <- quantile(thetas.m.heads,c(0,0.25,0.50,0.75,1))
round(quantiles,4)
##     0%    25%    50%    75%   100% 
## 0.1351 0.2898 0.3473 0.4174 0.7939

# 4. Using STAN

modelString = "
    data{
        int<lower=0> n;
        int<lower=0> x; 
        real<lower=0> a1;
        real<lower=0> b1;
        real<lower=0> a2;
        real<lower=0> b2;
        real<lower=0> a3;
        real<lower=0> b3;
        real<lower=0, upper=1> gamma1;
        real<lower=0, upper=1> gamma2;
        real<lower=0, upper=1> gamma3;
        real<lower=0, upper=1> cumgamma1;
        real<lower=0, upper=1> cumgamma2;
    }
    parameters{
        real<lower=0, upper=1> u;
        // Three independent component-level probabilities in (0,1);
        // a simplex would wrongly force them to sum to 1
        vector<lower=0, upper=1>[3] theta;
    }
    transformed parameters{
        real<lower=0, upper=1> p;
        // The trick: u ~ Unif(0,1) compared against the cumulative weights
        // selects which mixture component p is taken from
        p = theta[1] * (u < cumgamma1) + theta[2] * (cumgamma1 <= u && u < cumgamma2) + theta[3] * (u >= cumgamma2);
    }
    model{
        x ~ binomial(n, p);
        theta[1] ~ beta(a1, b1);
        theta[2] ~ beta(a2, b2);
        theta[3] ~ beta(a3, b3);
        u ~ uniform(0, 1); 
    }
"
# Explicit initial values for the three component thetas (one list per chain).
# Note: the sampling() call below passes init_r instead, so these are not used.
init_theta <- list(list(theta=c(0.3, 0.5, 0.7)), list(theta=c(0.4, 0.5, 0.6)), list(theta=c(0.5, 0.5, 0.5)))
stanDso <- stan_model( model_code=modelString )
## Trying to compile a simple C file
## Running /Library/Frameworks/R.framework/Resources/bin/R CMD SHLIB foo.c
## clang -mmacosx-version-min=10.13 -I"/Library/Frameworks/R.framework/Resources/include" -DNDEBUG   -I"/Library/Frameworks/R.framework/Versions/4.2/Resources/library/Rcpp/include/"  -I"/Library/Frameworks/R.framework/Versions/4.2/Resources/library/RcppEigen/include/"  -I"/Library/Frameworks/R.framework/Versions/4.2/Resources/library/RcppEigen/include/unsupported"  -I"/Library/Frameworks/R.framework/Versions/4.2/Resources/library/BH/include" -I"/Library/Frameworks/R.framework/Versions/4.2/Resources/library/StanHeaders/include/src/"  -I"/Library/Frameworks/R.framework/Versions/4.2/Resources/library/StanHeaders/include/"  -I"/Library/Frameworks/R.framework/Versions/4.2/Resources/library/RcppParallel/include/"  -I"/Library/Frameworks/R.framework/Versions/4.2/Resources/library/rstan/include" -DEIGEN_NO_DEBUG  -DBOOST_DISABLE_ASSERTS  -DBOOST_PENDING_INTEGER_LOG2_HPP  -DSTAN_THREADS  -DBOOST_NO_AUTO_PTR  -include '/Library/Frameworks/R.framework/Versions/4.2/Resources/library/StanHeaders/include/stan/math/prim/mat/fun/Eigen.hpp'  -D_REENTRANT -DRCPP_PARALLEL_USE_TBB=1   -I/usr/local/include   -fPIC  -Wall -g -O2  -c foo.c -o foo.o
## In file included from <built-in>:1:
## In file included from /Library/Frameworks/R.framework/Versions/4.2/Resources/library/StanHeaders/include/stan/math/prim/mat/fun/Eigen.hpp:13:
## In file included from /Library/Frameworks/R.framework/Versions/4.2/Resources/library/RcppEigen/include/Eigen/Dense:1:
## In file included from /Library/Frameworks/R.framework/Versions/4.2/Resources/library/RcppEigen/include/Eigen/Core:88:
## /Library/Frameworks/R.framework/Versions/4.2/Resources/library/RcppEigen/include/Eigen/src/Core/util/Macros.h:628:1: error: unknown type name 'namespace'
## namespace Eigen {
## ^
## /Library/Frameworks/R.framework/Versions/4.2/Resources/library/RcppEigen/include/Eigen/src/Core/util/Macros.h:628:16: error: expected ';' after top level declarator
## namespace Eigen {
##                ^
##                ;
## In file included from <built-in>:1:
## In file included from /Library/Frameworks/R.framework/Versions/4.2/Resources/library/StanHeaders/include/stan/math/prim/mat/fun/Eigen.hpp:13:
## In file included from /Library/Frameworks/R.framework/Versions/4.2/Resources/library/RcppEigen/include/Eigen/Dense:1:
## /Library/Frameworks/R.framework/Versions/4.2/Resources/library/RcppEigen/include/Eigen/Core:96:10: fatal error: 'complex' file not found
## #include <complex>
##          ^~~~~~~~~
## 3 errors generated.
## make: *** [foo.o] Error 1
cumgamma1 <- prior.gamma1 / (prior.gamma1 + prior.gamma2 + prior.gamma3)

cumgamma2 <- (prior.gamma1 + prior.gamma2) / (prior.gamma1 + prior.gamma2 + prior.gamma3)

Mix.01.Standat <- list(a1=prior.alpha1, b1=prior.beta1, a2=prior.alpha2, b2=prior.beta2, b3=prior.beta3, a3=prior.alpha3,
                       gamma1=prior.gamma1, gamma2=prior.gamma2, gamma3=prior.gamma3, n=n, x=x.obs,
                       cumgamma1=cumgamma1, cumgamma2=cumgamma2)
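A quick check of the component-selection trick used in the model (a sketch, assuming the weights 0.5/0.2/0.3 as above): comparing a Unif(0,1) draw against the cumulative weights should pick components 1, 2 and 3 with the correct probabilities.

```r
# Simulate the selection rule: cut points at 0.5 and 0.7 partition [0,1)
# into intervals of length 0.5, 0.2 and 0.3.
set.seed(42)                        # illustrative seed
w <- c(0.5, 0.2, 0.3)
cut1 <- cumsum(w)[1]; cut2 <- cumsum(w)[2]   # 0.5 and 0.7
u.chk <- runif(100000)
comp.chk <- 1 + (u.chk >= cut1) + (u.chk >= cut2)
round(prop.table(table(comp.chk)), 2)        # approx. 0.50, 0.20, 0.30
```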


# Generate posterior sample with custom initial values
stanFit <- sampling( object=stanDso, 
                     data = Mix.01.Standat, 
                     chains = 3,
                     iter = 4000, 
                     warmup = 200, 
                     thin = 1,
                     init_r = 0.1)
## 
## SAMPLING FOR MODEL '1ad964083ea7ea4ad7d06c3a4d656b06' NOW (CHAIN 1).
## Chain 1: 
## Chain 1: Gradient evaluation took 2.7e-05 seconds
## Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0.27 seconds.
## Chain 1: Adjust your expectations accordingly!
## Chain 1: 
## Chain 1: 
## Chain 1: Iteration:    1 / 4000 [  0%]  (Warmup)
## Chain 1: Iteration:  201 / 4000 [  5%]  (Sampling)
## Chain 1: Iteration:  600 / 4000 [ 15%]  (Sampling)
## Chain 1: Iteration: 1000 / 4000 [ 25%]  (Sampling)
## Chain 1: Iteration: 1400 / 4000 [ 35%]  (Sampling)
## Chain 1: Iteration: 1800 / 4000 [ 45%]  (Sampling)
## Chain 1: Iteration: 2200 / 4000 [ 55%]  (Sampling)
## Chain 1: Iteration: 2600 / 4000 [ 65%]  (Sampling)
## Chain 1: Iteration: 3000 / 4000 [ 75%]  (Sampling)
## Chain 1: Iteration: 3400 / 4000 [ 85%]  (Sampling)
## Chain 1: Iteration: 3800 / 4000 [ 95%]  (Sampling)
## Chain 1: Iteration: 4000 / 4000 [100%]  (Sampling)
## Chain 1: 
## Chain 1:  Elapsed Time: 0.008527 seconds (Warm-up)
## Chain 1:                0.096938 seconds (Sampling)
## Chain 1:                0.105465 seconds (Total)
## Chain 1: 
## 
## SAMPLING FOR MODEL '1ad964083ea7ea4ad7d06c3a4d656b06' NOW (CHAIN 2).
## Chain 2: 
## Chain 2: Gradient evaluation took 8e-06 seconds
## Chain 2: 1000 transitions using 10 leapfrog steps per transition would take 0.08 seconds.
## Chain 2: Adjust your expectations accordingly!
## Chain 2: 
## Chain 2: 
## Chain 2: Iteration:    1 / 4000 [  0%]  (Warmup)
## Chain 2: Iteration:  201 / 4000 [  5%]  (Sampling)
## Chain 2: Iteration:  600 / 4000 [ 15%]  (Sampling)
## Chain 2: Iteration: 1000 / 4000 [ 25%]  (Sampling)
## Chain 2: Iteration: 1400 / 4000 [ 35%]  (Sampling)
## Chain 2: Iteration: 1800 / 4000 [ 45%]  (Sampling)
## Chain 2: Iteration: 2200 / 4000 [ 55%]  (Sampling)
## Chain 2: Iteration: 2600 / 4000 [ 65%]  (Sampling)
## Chain 2: Iteration: 3000 / 4000 [ 75%]  (Sampling)
## Chain 2: Iteration: 3400 / 4000 [ 85%]  (Sampling)
## Chain 2: Iteration: 3800 / 4000 [ 95%]  (Sampling)
## Chain 2: Iteration: 4000 / 4000 [100%]  (Sampling)
## Chain 2: 
## Chain 2:  Elapsed Time: 0.008755 seconds (Warm-up)
## Chain 2:                0.109217 seconds (Sampling)
## Chain 2:                0.117972 seconds (Total)
## Chain 2: 
## 
## SAMPLING FOR MODEL '1ad964083ea7ea4ad7d06c3a4d656b06' NOW (CHAIN 3).
## Chain 3: 
## Chain 3: Gradient evaluation took 9e-06 seconds
## Chain 3: 1000 transitions using 10 leapfrog steps per transition would take 0.09 seconds.
## Chain 3: Adjust your expectations accordingly!
## Chain 3: 
## Chain 3: 
## Chain 3: Iteration:    1 / 4000 [  0%]  (Warmup)
## Chain 3: Iteration:  201 / 4000 [  5%]  (Sampling)
## Chain 3: Iteration:  600 / 4000 [ 15%]  (Sampling)
## Chain 3: Iteration: 1000 / 4000 [ 25%]  (Sampling)
## Chain 3: Iteration: 1400 / 4000 [ 35%]  (Sampling)
## Chain 3: Iteration: 1800 / 4000 [ 45%]  (Sampling)
## Chain 3: Iteration: 2200 / 4000 [ 55%]  (Sampling)
## Chain 3: Iteration: 2600 / 4000 [ 65%]  (Sampling)
## Chain 3: Iteration: 3000 / 4000 [ 75%]  (Sampling)
## Chain 3: Iteration: 3400 / 4000 [ 85%]  (Sampling)
## Chain 3: Iteration: 3800 / 4000 [ 95%]  (Sampling)
## Chain 3: Iteration: 4000 / 4000 [100%]  (Sampling)
## Chain 3: 
## Chain 3:  Elapsed Time: 0.008721 seconds (Warm-up)
## Chain 3:                0.110055 seconds (Sampling)
## Chain 3:                0.118776 seconds (Total)
## Chain 3: